WordRank: Learning Word Embeddings via Robust Ranking

نویسندگان

Shihao Ji

Hyokun Yun

Pinar Yanardag

Shin Matsushima

S. V. N. Vishwanathan

چکیده

Embedding words in a vector space has gained a lot of attention in recent years. While state-of-the-art methods provide efficient computation of word similarities via a low-dimensional matrix embedding, their motivation is often left unclear. In this paper, we argue that word embedding can be naturally viewed as a ranking problem due to the ranking nature of the evaluation metrics. Then, based on this insight, we propose a novel framework WordRank that efficiently estimates word representations via robust ranking, in which the attention mechanism and robustness to noise are readily achieved via the DCG-like ranking losses. The performance of WordRank is measured in word similarity and word analogy benchmarks, and the results are compared to the state-of-the-art word embedding techniques. Our algorithm is very competitive to the state-of-thearts on large corpora, while outperforms them by a significant margin when the training set is limited (i.e., sparse and noisy). With 17 million tokens, WordRank performs almost as well as existing methods using 7.2 billion tokens on a popular word similarity benchmark. Our multi-machine distributed implementation of WordRank is open sourced for general usage.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sentiment Intensity Ranking among Adjectives Using Sentiment Bearing Word Embeddings

Identification of intensity ordering among polar (positive or negative) words which have the same semantics can lead to a finegrained sentiment analysis. For example, master, seasoned and familiar point to different intensity levels, though they all convey the same meaning (semantics), i.e., expertise: having a good knowledge of. In this paper, we propose a semisupervised technique that uses se...

متن کامل

Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints

In this paper, we propose a general framework to incorporate semantic knowledge into the popular data-driven learning process of word embeddings to improve the quality of them. Under this framework, we represent semantic knowledge as many ordinal ranking inequalities and formulate the learning of semantic word embeddings (SWE) as a constrained optimization problem, where the data-derived object...

متن کامل

Online Learning of Interpretable Word Embeddings

Word embeddings encode semantic meanings of words into low-dimension word vectors. In most word embeddings, one cannot interpret the meanings of specific dimensions of those word vectors. Nonnegative matrix factorization (NMF) has been proposed to learn interpretable word embeddings via non-negative constraints. However, NMF methods suffer from scale and memory issue because they have to mainta...

متن کامل

Improving Document Ranking with Dual Word Embeddings

This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the inp...

متن کامل

A Simple and Effective Unsupervised Word Segmentation Approach

In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our approach is a novel word induction criterion called WordRank, which estimates the goodness of word hypotheses (character or phoneme sequences). We devise a method to derive exterior word boundary information from the link structures of adjacent word hypotheses and incorporate interior word boundary...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

WordRank: Learning Word Embeddings via Robust Ranking

نویسندگان

چکیده

منابع مشابه

Sentiment Intensity Ranking among Adjectives Using Sentiment Bearing Word Embeddings

Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints

Online Learning of Interpretable Word Embeddings

Improving Document Ranking with Dual Word Embeddings

A Simple and Effective Unsupervised Word Segmentation Approach

عنوان ژورنال:

اشتراک گذاری